Artificial intelligence-enhanced care pathway planning and scheduling system: content validity assessment of required functionalities

Background Artificial intelligence (AI) and machine learning are transforming the optimization of clinical and patient workflows in healthcare. There is a need for research to specify clinical requirements for AI-enhanced care pathway planning and scheduling systems to improve human–AI interaction in machine learning applications. The aim of this study was to assess content validity and prioritize the most relevant functionalities of an AI-enhanced care pathway planning and scheduling system. Methods A prospective content validity assessment was conducted in five university hospitals in three different countries using an electronic survey. The content of the survey was formed from clinical requirements, which were formulated into generic statements of required AI functionalities. The relevance of each statement was evaluated using a content validity index. In addition, weighted ranking points were calculated to prioritize the most relevant functionalities of an AI-enhanced care pathway planning and scheduling system. Results A total of 50 responses were received from clinical professionals in three European countries. Item-level content validity indices ranged from 0.42 to 0.96, and 45% of the generic statements were considered good. The highest ranked functionalities for an AI-enhanced care pathway planning and scheduling system were related to risk assessment, patient profiling, and resources. The highest ranked functionalities for the user interface were related to the explainability of machine learning models. Conclusion This study provided a comprehensive list of functionalities that can be used to design future AI-enhanced solutions and evaluate the designed solutions against requirements. The statements concerning the AI functionalities were considered somewhat relevant, which might be due to the low level of organizational readiness for AI in healthcare.


Background
Artificial intelligence (AI) and machine learning (ML) are transforming the optimization of clinical and patient workflows in healthcare. The adoption of AI and ML technologies in care pathway planning and scheduling systems can enable early risk assessment [1], provide more accurate schedules [2][3][4][5][6][7], reduce blocking [8], and thus maximize efficiency [9], minimize unnecessary costs [10], and tackle excessive waiting times [11] throughout the care pathway. However, current care pathway planning and scheduling systems are mostly manual, time-consuming, and resource-intensive [8]. In addition, resource allocation in healthcare seems to be backward-looking and based on prior caseloads.

*Correspondence: miia.jansson@oulu.fi
Due to growing demand for healthcare services [12], there is a great need for advanced care planning. Intelligent digital services are usually approached using mathematical modeling and made available to users through dedicated software. Yet, despite its clinical potential, AI is not a universal solution. Uncertainty, organizational readiness, and workflow integration have been the major barriers toward the widespread adoption of medical AI [13,14]. There is a need for research to specify clinical requirements for an AI-enhanced care pathway planning and scheduling system to improve human-AI interaction in ML applications [15].
Human-centered methods can be used to identify end-users' needs for AI-based clinical decision support systems. According to ISO 9241-210 [16], "Human-centered design is an approach to interactive systems development that aims to make systems usable and useful by focusing on the users, their needs and requirements, and by applying human factors/ergonomics, and usability knowledge and techniques". The ISO framework of human-centered design includes interactive and iterative phases to understand and specify the context of use, specify user requirements, design a solution, and evaluate the design against requirements.
This study is part of a larger research and development project that further develops existing digital solutions together with hospitals, technology providers, and researchers (https://aiccelerate.eu/). This article, however, focuses solely on the specified user requirements, with the aim of assessing content validity and prioritizing the most relevant functionalities of an AI-enhanced care pathway planning and scheduling system at the patient, unit, and resource levels.

Study design
A cross-sectional survey was carried out to assess content validity and prioritize the most relevant functionalities of the AI-enhanced care pathway planning and scheduling system. All methods were carried out in accordance with relevant guidelines and regulations (Declaration of Helsinki 2013).

Study procedures
This study was conducted in two phases: (1) domain identification, item generation, and survey formation and (2) content validation and content prioritization.

Phase 1: Domain identification, item generation, and survey formation
The content of the survey was formed from clinical requirements, which were collected from three AICCELERATE Smart Hospital Care Pathway Engine pilots in the three participating countries (https://aiccelerate.eu/) using human-centered methods such as solution charts, user personas, blueprints, and UI sketches of the solution. These pilots focused on (1) patient flow management for surgical units (Pilot 1); (2) a digital care pathway for Parkinson's disease (Pilot 2); and (3) pediatric service delivery (Pilot 3). The clinical requirements were then formulated into generic statements of functionalities and grouped into seven categories. The first category covered factual data on demographics (e.g., country, age, gender, work tenure, profession). In addition, one question related to information and decision-making ("Which of the following best describes how to use the information and knowledge to support your own work") with six response alternatives was included in the baseline characteristics. After domain identification, item generation, and instrument formation (a highly structured, self-administered, multiple-choice questionnaire), six experts (two anesthesiologists, two registered nurses, one ICT support specialist, and one biostatistician) evaluated the relevance, accuracy, clarity, and readability of each statement and identified whether any important issues were missing. Based on the experts' suggestions, minor revisions were made to the instructions, wording, and content of the survey.
A link to the questionnaire was sent via email to a contact person in each of the five participating university hospitals. The local contact persons were instructed to share the link with suitable experts working at their hospitals (purposive sampling). The email included a brief introductory letter about the current status of healthcare systems and the utilization of AI-enhanced solutions, the goals of the AICCELERATE project, the progress of the project thus far, and the importance of participating in the survey. The response time was initially one week. Due to the low number of responses, the response time was eventually extended to seven weeks, and the respondents were reminded three times. Completing the survey took approximately 10-50 min.

Phase 2: Content validation and prioritization
The content validity was assessed following a structured procedure by an expert panel of clinical professionals who were selected for their methodological and/or clinical expertise. Following Lynn [17], the respondents were asked to rate the relevance of each generic statement independently on a 4-point Likert scale. As an additional indicator of relevance, the respondents were asked to rank the generic statements within each category by importance on a 5-point scale (from 1st to 5th most important). The respondents were also encouraged to give open comments and describe additional clinical requirements at the end of the survey.

Data analysis
A content validity assessment was applied to evaluate the relevance of the content. The item-level content validity index (I-CVI) was calculated by dividing the number of respondents rating the item as quite or highly relevant by the total number of respondents who gave an acceptable rating. An I-CVI > 0.83 was considered good [17]. Additionally, weighted ranking points (WRP) were calculated: the respondents were asked to rank the five (four in categories 4, 5, and 6) most important statements. We recoded the first-, second-, and third-ranked statements as 60, 30, and 10 points, respectively, to emphasize the differences in importance between them. Finally, the WRP of each statement was calculated as the sum of the recoded values [18]. Due to the low number of open-ended comments, these were not analyzed.
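The two indices described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the numeric coding of Lynn's scale (1 = not relevant to 4 = highly relevant), the treatment of missing ratings as unusable, the zero points given to statements ranked 4th or 5th, and the statement ids "A", "B", "C" are all hypothetical.

```python
from collections import Counter

# Assumed coding of Lynn's 4-point relevance scale:
# 1 = not relevant, 2 = somewhat relevant, 3 = quite relevant, 4 = highly relevant.
VALID_RATINGS = {1, 2, 3, 4}
RELEVANT_RATINGS = {3, 4}

# Rank-to-points recoding from the text. Assumption: statements ranked
# 4th or 5th (or left unranked) receive 0 points.
RANK_POINTS = {1: 60, 2: 30, 3: 10}


def item_cvi(ratings):
    """I-CVI: share of usable ratings that are 'quite' or 'highly' relevant.

    Assumption: values outside 1-4 (e.g. None for a skipped item) are
    unusable and are excluded from the denominator.
    """
    usable = [r for r in ratings if r in VALID_RATINGS]
    if not usable:
        raise ValueError("no usable ratings for this item")
    return sum(r in RELEVANT_RATINGS for r in usable) / len(usable)


def weighted_ranking_points(rankings):
    """Sum the recoded rank points per statement over all respondents.

    `rankings` holds one dict per respondent, mapping a (hypothetical)
    statement id to the rank that respondent gave it.
    """
    totals = Counter()
    for respondent in rankings:
        for statement, rank in respondent.items():
            totals[statement] += RANK_POINTS.get(rank, 0)
    return totals


# Hypothetical data: five ratings for one item, rankings from three respondents.
print(item_cvi([4, 4, 3, 2, None]))  # 0.75
wrp = weighted_ranking_points([
    {"A": 1, "B": 2, "C": 3},
    {"B": 1, "A": 2, "C": 3},
    {"A": 1, "C": 2, "B": 3},
])
print(wrp.most_common())  # [('A', 150), ('B', 100), ('C', 50)]
```

The steep 60/30/10 weighting means that a single first-place vote outweighs several third-place votes, which is the emphasis on top-ranked statements that the recoding is meant to produce.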

Results
The final survey contained 6 items and 33 statements divided into seven main categories: demographics (6 items), relevancy of unit-level recommendations for operation (13 statements), relevancy of unit-level recommendations for patients (6 statements), the importance of patient-level functionalities in the UI (5 statements), the importance of unit-level functionalities in the UI (5 statements), and the importance of functionalities in the UI (4 statements).

Category 2: Relevancy of unit-level recommendations for operation
The top three ranked statements were: (1) It is important that AI recognizes if the patients are in risk for adverse events during the care; (2) It is important that AI is able to make individual patient profiles based on previous data; and (3) It is important that AI can suggest the best possible timing for a treatment or visit based on patient risks and the predicted patient flow. The ranking within the category as well as overall ranking can be seen in Table 3.

Category 3: Relevancy of unit-level recommendations for patient-predicted perioperative processes/patient flows
The top three ranked statements were: (1) It is important that AI is able to recognize the possible factors and patterns causing adverse events after care or prolonged need for care; (2) It is important that AI is able to predict available resources for certain time points based on data of internal and external factors; and (3) It is important that AI is able to recognize the days of increased need for care and increased need for resources and the factors causing them. The ranking within the category as well as overall ranking can be seen in Table 3.

Category 4: Importance of patient-level functionalities in the UI
The top three ranked statements were: (1) The user interface updates the visualization of the predicted evolution of the patient's condition based on historical and live patient data; (2) The user interface has a visualization for the predicted patient flow and the reasoning behind it for a particular patient; and (3) The user interface has functionalities for finding the appropriate and right timing for a particular patient's treatment. The ranking within the category as well as overall ranking can be seen in Table 3.

Category 5: Importance of unit-level functionalities in the UI
The top three ranked statements were: (1) The user interface has a view of the recommended order of patient treatment; (2) The user interface updates the visualization of the patient flow for a particular patient during care; and (3) The user interface has a visualization of the general unit/hospital patient flow. The ranking within the category as well as overall ranking can be seen in Table 3.

Category 6: Importance of functionalities in the UI
The top three ranked statements were: (1) The user interface has a functionality to check if limited staff availability is anticipated during the planned treatment time; (2) The user interface has a functionality to check if the hospital capacity is anticipated to be limited during the planned treatment time; and (3) The user interface has a visualization of the predicted hospital capacity as a replicate of the hospital environment. The ranking within the category as well as overall ranking can be seen in Table 3.

Data summary
In general, the I-CVIs ranged from 0.42 to 0.96, and the average CVI was 0.754. Forty-five percent of the generic statements were considered good (I-CVI > 0.83). According to the WRPs, the highest ranked functionalities for the AI-enhanced care pathway planning and scheduling system were related to risk assessment, patient profiling, and resources (Table 4). Correspondingly, the highest ranked functionalities for the UI were related to the explainability of ML models.
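The summary statistics above can be computed from a list of I-CVIs with a few lines of Python. This sketch assumes that the "average CVI" is the plain mean of the item-level indices (often called S-CVI/Ave) and applies the > 0.83 cut-off from the Data analysis section. Because the full set of 33 I-CVIs is not reported here, the example uses only the five values listed in the table of highest-ranked statements, so its output does not reproduce the study's 0.754 or 45%.

```python
from statistics import fmean

GOOD_THRESHOLD = 0.83  # an I-CVI strictly above this value is "good"


def summarize_icvis(icvis):
    """Return (min, max, mean I-CVI, share of items rated good)."""
    if not icvis:
        raise ValueError("need at least one I-CVI")
    good = sum(v > GOOD_THRESHOLD for v in icvis)
    return min(icvis), max(icvis), fmean(icvis), good / len(icvis)


# Illustration only: the five I-CVIs from the table of highest-ranked
# statements, not the full set of items analysed in the study.
table_icvis = [0.96, 0.86, 0.84, 0.82, 0.86]
lo, hi, mean, share_good = summarize_icvis(table_icvis)
print(f"range {lo:.2f}-{hi:.2f}, mean {mean:.3f}, good {share_good:.0%}")
# range 0.82-0.96, mean 0.868, good 80%
```

Note that the statement with an I-CVI of 0.82 falls just below the threshold, so even among the top-ranked statements not every item meets the "good" criterion.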

Discussion
According to our findings, the highest ranked functionalities for AI-enhanced care pathway planning and scheduling systems were related to risk assessment, patient profiling, and the use of shared resources (e.g., personnel, time) at the patient and unit levels. In the literature, AI-enhanced scheduling systems have been used to identify modifiable risk factors and to stratify patients into high- and low-risk groups to optimize preventive measures in advance [1,19,20]. In addition, intelligent digital services have been used to predict the duration of surgery (DOS) [2][3][4][5][6][7] and the postoperative length of stay [2] with a high degree of accuracy to optimize resource management. The highest ranked functionalities for the UI were related to the explainability of ML models (e.g., predictors, visualization), which is in line with the newly adopted European Medical Device Regulation (EU 2017/745) [21], the upcoming EU AI Act (2021/0106/COD) [22], and the Digital Health Software Precertification (Pre-Cert) Program initiative [23]. In general, uncertainty about and distrust of AI predictions have been the major barriers toward the widespread adoption of medical AI [13]. This mistrust often stems from a lack of model explainability, in which the relationship between the input and output of the underlying algorithms is unclear [24]. In addition, many organizations are still unfamiliar with digital transformation due to organizational (e.g., motivational readiness, institutional resources, staff attributes, and organizational climate) [14], technical (e.g., limited technology capabilities), and non-technical (e.g., lack of management support) challenges [25].
In this regard, the organization's readiness for the adoption of AI is critical to the success of technological change. According to Jöhnk et al. [25], possible application scenarios of AI are not always obvious, and organizations must understand the technology to decide on the intended purpose of adoption. For that reason, organizations must continuously assess and develop their AI readiness, including strategic alignment (AI-business potentials, customer AI readiness), resources (e.g., financial budget, IT infrastructure), knowledge (e.g., AI awareness, upskilling, AI ethics), culture (e.g., change management, innovativeness), and data (e.g., availability, quality), throughout the AI adoption process to ensure successful integration and avoid unnecessary investments and costs [14,25].
In this study, the most relevant functionalities were related to situational awareness (e.g., the risk of adverse effects, clinical deterioration, or triage) rather than to optimal resource usage (e.g., cancellations, overstays, or unnecessary laboratory tests) or organizational necessity, highlighting both context- and purpose-specific perspectives on AI readiness. In the previous literature, user perceptions of digital transformation have varied between professional groups, reflecting the different needs and expectations associated with specific roles and responsibilities [26].
The results of this study highlight the preoperative phase of the surgical pathway (e.g., personalized risk assessment and optimization). It must be noted, however, that the intraoperative (e.g., the actual DOS) and postoperative (e.g., early detection of adverse effects/events) phases are equally important for the continuum and coordination of care, for instance to improve the workflow and reduce blocking. In addition, explainable AI could be used to facilitate shared decision-making by helping patients understand their individual risks and outcomes and select among the available treatment options according to their individual needs and goals [25]. However, the current use of information systems seems to be backward-looking.
Improving trust requires the development of more transparent ML methods in the near future. Closer attention to human-AI interaction is warranted to improve transparency in medical AI and thus support accurate and trustworthy decision-making [15]. In addition, the expertise of respondents and novel research methods should be taken into account. Despite widespread unfamiliarity with AI, its future is promising. Novel methods are needed to identify "unknown unknowns" in innovative projects.

Limitations
Our study had several limitations related to sampling, participation, and response bias. First, the sample size was limited, although it covered five university hospitals in three different countries. In addition, the response rate of the selected experts was not calculated. Second, the majority of the respondents were physicians, and most of the respondents were from Finland, which may have affected the perceived relevance. Nevertheless, the survey was sent to all suitable experts across professions, and the local contact persons sent repeated reminders. We were, however, unable to control for multiple submissions (if any) and unintended respondents. Third, response bias may also have affected the validity of the survey; this bias was minimized by conducting the survey anonymously. Fourth, the statements concerning AI functionalities were considered only somewhat relevant. This might be due to the low level of organizational readiness for AI in healthcare.

Conclusion
This study provided a comprehensive list of functionalities that can be used to design future AI-enhanced solutions and evaluate the designed solutions against requirements. The statements were considered somewhat relevant, which might be due to the low level of organizational readiness for AI in healthcare.

The highest ranked statements | I-CVI
It is important that AI recognizes if the patients are in risk for adverse events during the care | 0.96
It is important that AI is able to predict available resources for certain time points based on data of internal and external factors | 0.86
The user interface updates the visualization of predicted evolution of patient's condition based on the historical and live patient data | 0.84
The user interface has a view of the recommended order of patients' treatment | 0.82
The user interface has a functionality to check if hospital capacity is anticipated to be limited during the planned treatment time | 0.86